Generating flexible proper name references in text: Data, models and evaluation
Authors
Abstract
This study introduces a statistical model able to generate variations of a proper name by taking into account the person to be mentioned, the discourse context and variation. The model relies on the REGnames corpus, a dataset with 53,102 proper name references to 1,000 people in different discourse contexts. We evaluate the versions of our model from the perspective of how human writers produce proper names, and also how human readers process them. The corpus and the model are publicly available.
Similar resources
Towards proper name generation: a corpus analysis
We introduce a corpus for the study of proper name generation. The corpus consists of proper name references to people in webpages, extracted from the Wikilinks corpus. In our analyses, we aim to identify the different ways, in terms of length and form, in which proper names are produced throughout a text.
Designing a Bank-Based Flexible Performance Evaluation System (Study: Bank Shahr)
Given the limitations of the existing performance evaluation models for organizations with dynamic internal and external conditions, this study aims to provide a flexible performance evaluation model with adaptability to intra- and extra-organizational changes. The present study first forms a database of criteria related to banking activities. After gathering the experts' opinions, we select 2...
Introducing of Dirichlet process prior in the Nonparametric Bayesian models frame work
Statistical models are used to learn about the mechanism that generates the data. It is often assumed that the random variables y_i, i = 1, …, n, are samples from a probability distribution F that belongs to a parametric class of distributions. However, in practice, a parametric model may be inappropriate for describing the data. In this setting, the parametric assumption could be r...
Identifying Unknown Proper Names In Newswire Text
The identification of unknown proper names in text is a significant challenge for NLP systems operating on unrestricted text. A system which indexes documents according to name references can be useful for information retrieval or as a preprocessor for more knowledge intensive tasks such as database extraction. This paper describes a system which uses text skimming techniques for deriving prope...
Generating proper name pro for automatic speech
Generating the correct pronunciation of proper names remains one of the most difficult tasks in text-to-phoneme transcription. Although phonetic rules can be efficient in processing proper names of one language, foreign family names cannot always be generated correctly without additional pronunciation rules. The present study addresses the problem of pronunciation variants for French and foreign fa...